New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition
This paper presents a novel noise-robust feature extraction method for speech
recognition using the robust perceptual minimum variance distortionless
response (MVDR) spectrum of the temporally filtered autocorrelation sequence.
The perceptual MVDR spectrum of the filtered short-time autocorrelation
sequence can reduce the effects of the residual nonstationary additive noise
that remains after filtering the autocorrelation. To achieve a more robust
front-end, we also modify the robust distortionless constraint of the MVDR
spectral estimation method by reweighting the subband power spectrum values
according to the subband signal-to-noise ratios (SNRs), which adapts the
constraint to the proposed approach. This new weighting function passes the
components of the input signal at the frequencies least affected by noise with
larger weights and attenuates the noisy and undesired components more
effectively. This modification reduces the noise residuals of the spectrum
estimated from the filtered autocorrelation sequence, thereby leading to a
more robust algorithm. When evaluated on the Aurora 2 recognition task, the
proposed method outperformed the Mel-frequency cepstral coefficient (MFCC)
baseline, relative autocorrelation sequence MFCC (RAS-MFCC) features, and
MVDR-based features in several different noisy conditions.
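
For orientation, the textbook MVDR spectral estimate that the abstract builds on can be sketched as below. This is only the standard form, P(w) = 1 / (v^H R^-1 v), where R is the Toeplitz autocorrelation matrix and v is the steering vector at angular frequency w. It does not include the paper's perceptual warping, temporal filtering of the autocorrelation, or SNR-based weighting of the distortionless constraint. The function names and the direct linear solve are illustrative assumptions, not the authors' implementation.

```python
import cmath


def autocorrelation(frame, order):
    """Biased autocorrelation estimates r[0..order] of one real-valued frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + k] for t in range(n - k)) / n
            for k in range(order + 1)]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def mvdr_spectrum(r, omegas):
    """Textbook MVDR power 1 / (v^H R^-1 v) at each angular frequency."""
    m = len(r)
    # Symmetric Toeplitz autocorrelation matrix built from lags r[0..m-1].
    toeplitz = [[r[abs(i - j)] for j in range(m)] for i in range(m)]
    spectrum = []
    for w in omegas:
        v = [cmath.exp(1j * w * k) for k in range(m)]  # steering vector
        x = solve(toeplitz, v)                         # x = R^-1 v
        denom = sum(vk.conjugate() * xk for vk, xk in zip(v, x)).real
        spectrum.append(1.0 / denom)
    return spectrum
```

For a white-noise autocorrelation such as `[1.0, 0.0, 0.0, 0.0]` the estimate is flat at approximately 0.25 for every frequency, since R is the identity and v^H v equals the number of lags. The direct solve is chosen for readability; practical front-ends usually compute the same quantity from linear-prediction coefficients.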
Human Action Recognition in Still Images Using ConViT
Understanding the relationship between different parts of the image plays a
crucial role in many visual recognition tasks. Despite the fact that
Convolutional Neural Networks (CNNs) have demonstrated impressive results in
detecting single objects, they lack the capability to extract the relationship
between various regions of an image, which is a crucial factor in human action
recognition. To address this problem, this paper proposes a new module that
functions like a convolutional layer using Vision Transformer (ViT). The
proposed action recognition model comprises two components: the first part is a
deep convolutional network that extracts high-level spatial features from the
image, and the second component of the model utilizes a Vision Transformer that
extracts the relationship between various regions of the image using the
feature map generated by the CNN output. The proposed model has been evaluated
on the Stanford40 and PASCAL VOC 2012 action datasets, achieving 95.5% and
91.5% mAP, respectively, results that are promising compared to other
state-of-the-art methods.
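
The second stage described in this abstract, relating regions of a CNN feature map to one another, can be illustrated with a minimal sketch. It assumes the feature map is flattened into one token per spatial location and passed through a single self-attention step. To stay short it uses identity projections in place of the learned query, key, and value weights, and it omits the multi-head structure, positional embeddings, and MLP blocks of a real Vision Transformer. The function names are hypothetical, and the paper's actual module is not specified in the abstract.

```python
import math


def feature_map_to_tokens(feature_map):
    """Flatten a C x H x W feature map into H*W tokens of dimension C."""
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [[feature_map[c][y][x] for c in range(channels)]
            for y in range(height) for x in range(width)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections.

    Returns the attended tokens and the attention weights, where
    weights[i][j] says how strongly region i attends to region j.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in tokens:
        scores = [sum(q * k for q, k in zip(query, key)) / scale
                  for key in tokens]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(a * value[d] for a, value in zip(attn, tokens))
                        for d in range(dim)])
    return outputs, weights


# Toy 2-channel, 2 x 2 feature map standing in for a CNN output.
toy_map = [[[1.0, 0.0], [0.0, 1.0]],
           [[0.0, 1.0], [1.0, 0.0]]]
tokens = feature_map_to_tokens(toy_map)
attended, attention = self_attention(tokens)
```

Each row of `attention` sums to 1, so every output token is a weighted mixture of all regions. This mixing is what lets the transformer stage capture relationships between distant parts of the image.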